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ABSTRACT 

Estimating the uncertainty on the matter power spectrum internally (i.e. directly from the data) 
is made challenging by the simple fact that galaxy surveys offer at most a few independent 
samples. In addition, surveys have non-trivial geometries, which make the interpretation of 
the observations even trickier, but the uncertainty can nevertheless be worked out within the 
Gaussian approximation. With the recent realization that Gaussian treatments of the power 
spectrum lead to biased error bars about the dilation of the baryonic acoustic oscillation scale, 
efforts are being directed towards developing non-Gaussian analyses, mainly from N-body 
simulations so far. Unfortunately, there is currently no way to tell how the non-Gaussian fea- 
tures observed in the simulations compare to those of the real Universe, and it is generally 
hard to tell at what level of accuracy the N-body simulations can model complicated non- 
linear effects such as mode coupling and galaxy bias. We propose in this paper a novel method 
that aims at measuring non-Gaussian error bars on the matter power spectrum directly from 
galaxy survey data. We utilize known symmetries of the 4-point function, Wiener filtering 
and principal component analysis to estimate the full covariance matrix from only four inde- 
pendent fields. We assess the quality of the estimated covariance matrix with a measurement 
of the Fisher information content in the amplitude of the power spectrum. With the noise 
filtering techniques and only four fields, we are able to recover the results obtained from a 
large N = 200 sample to within 20 per cent, for k < l.OAMpc . We further provide error 
bars on Fisher information and on the best-fitting parameters, and identify which parts of the 
non-Gaussian features are the hardest to extract. Finally, we provide a prescription to extract 
a noise-filtered, non-Gaussian, covariance matrix from a handful of fields in the presence of a 
survey selection function. 
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1 INTRODUCTION 

The matter power spectrum contains a wealth of information 
about a number of cosmological parameters, and measuring its 
amplitude with per cent level pre cision has b e come o ne of the main 
task of modern cosmolo gy I York et all 20001: 
20031: ISchlegel & others] 120071: 



Drinkwater et alj l20ld: 



LSST Dark Energy Science Collaboration] 20121 : iBemtez et al.l 
20091 Pan-STARRS0, DE^j). Cosmologists are especially in- 
terested in the detection of the Baryonic Acoustic Oscillation 
(BAO) scale, which allows to m easure the evolution of the 
dark energy equation of s t ate w(z) jg isenstein et al. 2005; Hutsil 



20061: [Tegmarketal 
Ande rson et al .1 120 12) 



l2006l ; |Perciv al et al. 2007; Blak e et alj|2oTll ; 
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Estimating the mean power spectrum from a galaxy sur- 
vey is a challenging task, as one needs to incorporate the sur- 
vey mask, model the redshift distortions, estimate the galaxy bias, 
etc . For this purpose, m any data analyses follow the prescriptions 
of iFeldman et al.l 



^— — j-i oi retuman et an i 

=r% e ., a n ' dVogelev & Szalavi 



1994) (FKP) or the Pseudo Karhunen-Loeve 
1996)(PKL hereafter), which provide unbiased 
estimates of the underlying power spectrum, as long as the ob- 
serve field is Gaussian in nature. When these methods are applied 
on a non-Gaussian field, however, the power spectr um estimator is 
no lo nger optimal, and the error about it is biased ( Tegma rk et al.l 
l2006h . 

As first discussed in iMeiksin & White! |l999) and 
iRimes & Hamilton (2005), Non-Gaussian effects on power 
spectrum measurements can be quite large; for instance, the Fisher 
information content in the matter power spectrum saturates in 
the trans-linear regime, which causes the number of degrees of 
freedom to deviate from the Gaussian prediction by up to three 
orders of magnitude. The obvious questions to ask, then, are 'How 
do non-Gaussian features propagate on to physical parameters, like 
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the BAO scale, redshift space distortions, neutrino masses, etc.?' 
and 'How large is this bias in actual data analyses, as opposed to 
simulations?' As a partial answer to the first question, it was shown 
from a large ensemble of N-body simulations that the Gaussian 
estimator, acting on non-Gaussian fields, produces error bars on 
the BAO dil ation scale that d iffer from full Gaussian case by up to 
15 per cent dNgan et alj|2012h . 

The first paper of this series, lHarnois-De raps & Pen! J2012t) 
(hereafter HDP1) addresses the second issue, and measure how 
large is the bias in the presence of a selection function. Starting with 
the 2dFGRS survey selection function as a study case, and mod- 
elling the non-Gaussian features from 200 N-body simulations, it 
was found that the difference between Gaussian and non-Gaussian 
error bars on the power spectrum is enhanced by the presence of 
a non-trivial surve y geometrjU The 15 per cent bias observed by 
iNean et alj J2OI2I) is, in that sense, a lower bound on the actual 
bias that exists in current treatments of the data. 

At this point, one could object that the sizes of the biases we 
are discussing here are very small, and that analyses with error bars 
robust to with 20 per cent are still in excellent shape and rather 
robust. However, the story reads differently in the context of dark 
energy, where the final goal of the global international effort is to 
minimize the error bars about w(z) by performing a succession of 
experiments with increasing accuracy and resolution. In the end, a 
20 per cent bias on the BAO scale has a quite large impact on the 
dark energy 'figure-of-merit', and removing this effect is the main 
goal of this series of paper. 

Generally, the onset of non-Gaussianities can be understood 
from asymmetries that develops in the matter fields subjected to 
gravitation, starti ng at the smallest scales and working their way up 
to larger scales dBernardeau et alj|2002l) . In Fourier space, Gaus- 
sian fields can be completely described by their power spectrum, 
whereas non-Gaussian fields also store information in higher mo- 
ments. For instance, the non-linear dynamics that describe the 
scales with k > 0.5/iMpc~' tend to couple the Fourier modes of 
the power spectrum, which effectively correlates the measurements. 
This correlation w as indeed found from very large samples of N- 
body simulations ( Taka hashi et ai]l2009l) . and act as to lower the 
number of degrees of freedom in a power spectrum measurement. 

One approach that was thought to minimizes these complica- 
tions consi sts in excluding most of the non-linear scales, as pro- 
posed in lSeo & Eisenstei nT d2003l) . However, this cuts out some of 
the BAO wiggles, thereby reducing our accuracy on the measured 
BAO scale. In addition, it is plausible that non-Gaussian features 
due to the non-linear dynamics, mask, and using simple Gaussian 
estimators on non-linear fields interact such as to impact scales as 
larch as k ~ 0.2/iMpc, as hinted by HDP1. Optimal analyses must 
therefore probe the signal that resides in the trans-linear and non- 
linear scales, and construct the power spectrum estimators based on 
known non-Gaussian properties of the measured fields. 

Characterization of the non-Gaussian features in the uncer- 
tainty about the matter power spectrum was recently attempted in 
HDP1, in which deviations from Gaussian calculations were pa- 
rameterized by simple fitting functions, whose best fitting param- 
eters were found from sampling 200 simulations of dark matter 

3 These error bars are unbiased, but unfortunately not optimal. As a matter 
of fact, the method still suffers from one of the inconvenience of the FKP 
formalism, namely that the convolution with the survey selection function 
effectively correlates the error bars. Removing this effect could be done by 
combining the methodology of HDP1 and that of PKL, but such a task is 
beyond the scope of this paper. 



particles. This was a first step in closing the gap that exists be- 
tween data analyses and simulations, but is incomplete in many 
aspects. First, it does not address the questions of cosmology and 
redshift dependence, halo bias nor of shot noise, which surely af- 
fect the best-fitting parameters, and completely overlooks the fact 
that observations are performed in redshift space. All these effects 
are crucial to understand in order to develop a self-consistent pre- 
scription with which we can perform non-Gaussian analyses in data 
(HDP3 in prep.). Second, and perhaps more importantly, this ap- 
proach assumes that the non-Gaussian features observed in N-body 
simulations are unbiased representations of those that exists in our 
Universe. Such a correspondence is assumed in any external error 
analyses, which involve mock catalogues typically constructed ei- 
ther from N-body simulations or from semi-analytica l techniques 
such as Log-Normal transfo rms dColes & Jones|[l99lh or PThalos 
dScoccimarro & Shethl2002l) . In that aspect, the FKP and PKL pre- 
scriptions are advantageous since they provide internal estimates 
of the error bars, hence avoid this issue completely. 

In this second paper (HDP2 hereafter), we set out to determine 
how well can we possibly measure these non-Gaussian features 
internally from a galaxy survey, including simple cases of survey 
masks. In that way, we are avoiding the complications caused by 
the cosmology and redshift dependence of the non-Gaussian fea- 
tures, and that of using incorrect non-Gaussian modelling of the 
fields. In many aspects, this question boils down to the problem 
of estimating error bars using a small number of measurements, 
with minimal external assumptions. In such low statistics envi- 
ronment, it has been sh own that the shrinkage estimation method 
dPope & Szapudlll2008l) can minimize the variance between a mea- 
sured covariance matrix and a target, or reference ma t rix. A lso 
important to mention is the attempt by [Hamilton et al. I d2006b to 
improve the numerical convergence of small samples by bootstrap 
resampling and re-weighting sub-volumes of each realizations, an 
approach that was unfortunately m ined down by the effect of beat 
coupling dRimes & Hamiltonll2006l) . 

With the parameterization proposed in HDP1, our approach 
has the advantage to provide physical insights on the non-Gaussian 
dynamics. In addition, it turns out that the basic symmetries of the 
four-point function of the density field found in HDP1 allow us to 
increase the number of internal independent subsamples by a large 
amount, with only a few quantifiable and generic Bayesian priors. 
In particular, it was shown that the contributions to the power spec- 
trum covariance matrix consist of two parts: the Gaussian zero-lag 
part, plus a broad, smooth, non-local, non-Gaussian component. In 
this picture, both the Gaussian and non-Gaussian contributions can 
be accurately estimated even within a single density field, because 
many independent configurations contribute. However, any attempt 
to estimate them simultaneously, i.e. without differentiating their 
distinct singular nature, results in large sample variance. 

This paper takes advantage of this reduced variance to opti- 
mize the measurement of the covariance matrix from only four N- 
body realizations. In such low statistics, the noise becomes domi- 
nant over a large dynamical range, hence we propose and compare a 
number of noise reduction techniques. In order to gain physical in- 
sights, we pay special attention to provide error bars on each of the 
non-Gaussian parameters and therefore track down the dominant 
source of noise in non-Gaussian fields. To quantify the accuracy 
of the method, we compare and calibrate our results with a larger 
sample of 200 N-body realizations, and use the Fisher information 
about the power spectrum amplitude as a metric of the performance 
of each noise reduction techniques. 

The issue of accuracy is entangled with a major aspect com- 
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mon to most non-Gaussian analyses: they require a measurement of 
the full power spectrum covariance matrix, which is noisy by na- 
ture. For instance, an accurate measurement of N 2 matrix elements 
generally requires much more then N realizations; it is arguable that 
N 2 independent measurements could be enough, but this statement 
generally depends on the final measurement to be carried and on 
the required level of accuracy that is sought. Early numerical cal- 
culati ons wer e performed on 400 re alizations dRimes & Hamilton! 
l2005h . while iTakahashi et alj d2009h have perform ed as many as 
5000 full N-body simulations. iNgan et a have opted for 

fewer realizations, but complemented the measurements with noise 
reductio n techniques, basical ly a principal component decomposi- 
tion (see lNorberg et al.ll2009L for another example). In any case, it 
is often unclear how many simulations are required to reach con- 
vergence; in this paper, we deploy strategies such as bootstrap re- 
sampling to assess the degree of precision of our measurements. 

We first review in section [2] the formalism and the theoret- 
ical background relevant for the estimations of the matter power 
spectrum and its covariance matrix, with special emphases on the 
dual nature and symmetries of the four point function, and on the 
dominant source of noise in our non-Gaussian parameterization. 
We then explain how to improve the measurement of C{k, k') in 
section [3] with three noise-reduction techniques, and compare the 
results with a straightforward bootstrap resampling. In section [4] 
we measure the Fisher information of the noise filtered covariance 
matrices, and compare our results against the large sample. Finally, 
we describe in section [5] a recipe to extract the non-Gaussian fea- 
tures of the power spectrum covariance in the presence of a survey 
selection function, in a low statistics environment, thus completely 
merging the techniques developed here with those of HDP1. This 
takes us significantly closer to a stage where we can perform non- 
Gaussian analyses of actual data. Conclusions and discussions are 
presented in the last section. 



2 ESTIMATION OF C(K,K'): NON-GAUSSIAN 
PARAMETERIZATION 

2.1 Power spectrum estimator 

In the ideal situation that exists only in N-body simulations - pe- 
riodic boundary conditions and no survey selection function effect 
- the matter power spectrum P(k) can be obtained in an unbiased 
way from Fourier transforms of the density contrast. The latter is 
defined as 5(x) = p(x)/p - 1, where p(x) is the density field and p 
its mean. Namely, 



<<5(k)<f (k')> = (27r)-V(k)<5 (k - k') 



(1) 



The Dirac delta function enforces the two scales to be identi- 
cal, and the bracket corresponds to a volume (or ensemble) av- 
erage. We refer the reader to HDP1 for details on our simula- 
tion suite, and briefly mention here that each of the N = 200 
realization is obtained by ev olving 256 3 particles with CUBEP 3 M 
dHarnois-Deraps et alj|2012l) down to a redshift of z = 0.5, with 
cosmological paramet ers consistent with the WMAP+BAO+SN 
five years data release dKomatsu etafl2 009). Simulated dark matter 
particles are assi gned onto a grid following a 'cloud in cell' interpo- 
lation scheme dHocknev & Eastw ood 1981), which is deconvolved 
in the power spectrum estimation. The isotropic power spectrum 
P(k) is finally obtained by taking the average over the solid angle. 

Observations depart from this ideal environment: they must 
incorporate a survey selection function and zero pad the bound- 




Figure 1. (top:) Power spectrum at z = 0.5, estimated from our template of 
200 N-body simulations (thin line) and from our 4 'measurements' (thick 
line). The error bars are the sample standard deviation. Throughout this 
paper, we represent the N = 4 and N = 200 samples with thick and thin 
solid lines respectively, unless otherwise mentioned in the legend or the 
caption, (bottom.) Fractional error with respect to the non-linear predictions 
of HALOFIT ISmith et all2003» . 



ary of the survey when constructing the power spectrum estima- 
tor, plus the galaxy positions are obtained in redshift space. We 
shall return to this framework in section [5] but for now, let focus 
our attention on simulation results. To quantify the accuracy of our 
measurements of P(k) and its covariance matrix in the context of 
low statistics, we construct a small sample by randomly selecting 
N = 4 realizations among the 200; we hereafter refer to these two 
samples as the N = 200 and the N = 4 samples. Fig. Q] shows the 
power spectrum as measured in these two sampl es, with a compar - 
ison to the non-linear predictions of HALOFIT dSmith et al]|2003l) . 
The simulations show a ~ 10 per cent positive bias compared to 
the predictions, a known systematic effect of the N-body code that 
happens when the simulator is started at very early redshifts. Start- 
ing later would remove this bias, but the covariance matrix, which 
is the focus of the current paper, becomes less accurate. This is an 
unfortunate trade-off that will be avoided in future production runs 
with the advent of an initial condition generator based on second 
order perturbation theory. The plotted error bars are smaller in the 
N = 4 sample, a clear example of bias on the error bars: the esti- 
mate on the variance is poorly determined, as expected from such 
low statistics, and the error on the N = 4 error bar is quite large. 
The mean P(k) measured in the two samples, however, agree at the 
per cent level. 

2.2 Covariance matrix estimator 

The complete description of the uncertainty about P(k) is contained 
in a covariance matrix, defined as : 



C(k,k') = <AP(£)AP(£')> 



(2) 



where AP(k) refers to the fluctuations of the power spectrum about 
the mean. If one has access to a large number of realizations, this 
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matrix can be computed straightforwardly, and convergence can 
be assessed with bootstrap resampling. In actual surveys, however, 
only a handful of sky patches are observed, and a full covariance 
matrix extracted from these is expected to be singular. 

To illustrate this, we present in Fig.[2]the covariance matrices, 
normalized to unity on the diagonal, estimated from the N = 200 
and N = 4 samples. While the former is overall smooth, we see that 
the latter is, in comparison, very noisy, especially in the large scales 
(low-k) regions. This is not surprising since these large scales are 
measured from very few Fourier modes, hence intrinsically have 
a much larger sample variance. It is clear why performing non- 
Gaussian analyses from the data is a challenging task: the covari- 
ance matrix is singular, plus the error bar about each element is 
large. This is nevertheless what we attempt to do in this paper, and 
in order to overcome the singular nature of the matrix, we need to 
approach the measurement from a slightly different angle. 

It was shown in HDP1 that there is an alternative way to esti- 
mate this matrix, which first requires a measurement of C(k, k',0), 
where 9 is the angle between the pair of Fourier modes (k, k'). We 
summarize here the properties of C(k, k', 9), and refer the reader to 
HDP1 for more details: 

• This four-point function receives a contribution from two 
parts: the degenerate singular configuration with all k vectors equal 
- the zero-lag point - and the smooth non-Gaussian component. 

• The zero-lag point corresponds to the Gaussian contribution 
and needs to be treated separately from the other points whenever 
k = k'. 

• As the angle approaches zero, the non-Gaussian component of 
C(k, k',9) increases significantly, especially when both scales are in 
the non-linear regime. 

• In the linear regime, most of the contribution comes from the 
zero-lag point, as expected from Gaussian statistics. 

As discussed in section 5.3 and in the Appendix of HDP1, 
C(k,k') can be obtained from an integration over the angular de- 
pendence of C(k, k',9): 

C(k, k' ) = dfiC(k, Id , 0)6 kk , + ^ C(k, k 1 , /z)dju 

= G(k, k') + NG(k, k') (3) 

where C(k, k',0) is the zero-lag point, and fi = cos$. Note that 
NG(k, k') refers to the 'Non-Gaussian' term, and should not be mis- 
read as NxG. Numerically, the difference between [Eq.[2) and [Eq. 
[3) is at the per cent level, at least in the trans-linear and non-linear 
regime^ 

2.3 k = k! case 

The break down of the covariance matrix proposed in [Eq.[3] opens 
up the possibility to explore which of the two terms is the easiest 
to measure, which is noisier, and eventually organize the measure- 
ment such as to take full advantage of these properties. The first 
term on the right hand side, G(k,k'), corresponds at the per cent 
level - at least in the trans-linear and non-linear regimes - to the 
Gaussian contribution 2P 2 (k)/N(k), where N(k) is the number of 

4 The agreement in the linear regime is reduced by discretization effects 
that cause the angles to be poorly determined, however this range is exactly 
where the theory is well understood. The overall strategy thus puts priors 
on the non-Gaussian features, requiring simply that these vanish for linear 
scales. 
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Figure 3. Comparison of the Gaussian term with the analytical expression. 
The top panel shows the ratio for the N = 200 sample, i.e. G20o(k)/C g (k), 
while the bottom panel shows the N = 4 ratio. In both cases, error bars are 
from bootstrap resampling. As explained in the text, the bias at low-k comes 
from from poor estimates of d/i in the lineal' regime. 



Fourier modes in the £-shell. This comparison is shown in Fig. [3] 
where the error bars are estimated from bootstrap resampling. In 
the N = 200 case, we randomly pick 200 realizations 500 times, 
and calculate the standard deviation across the measurements. In 
the N = 4 case, we randomly select 4 realizations 500 times, al- 
ways from the N = 200 sample. In the low-k regime, the simula- 
tions seem to underestimate the Gaussian predictions by up to 40 
per cent. This is not too surprising since the discretization effect 
is large there, and the angle between grid cell, dpi, are less accu- 
rate. However, it appears that a substitution G(k,k') — > C g would 
improve the results by correcting for this low-A: bias. 

We next look at the interplay between G and NG on the di- 
agonal of the covariance matrix (for the case where k = k'). We 
know from Fig. [3] that the Gaussian term is well measured and has 
a rapid convergence about C g (k). The left panel of Fig.|4]presents 
the diagonal components of the covariance matrix, divided by the 
Gaussian prediction, i.e. (G(k) + NG(k, k))/C g (k). The linear regime 
agrees well with the Gaussian statistics, then we observe strong de- 
viations about unity as the scales become smaller. In this figure, the 
error is again estimated from bootstrap resampling, even though the 
G + NG break down allows for more sophisticated error estimates 
(see section [3). 

An important observation is that the shape of the ratio is sim- 
ilar for both the large and small samples, which leads us to the 
conclusions that 1- departures from Gaussianities are clearly seen 
even in only four fields, and 2- both samples can be parameterize 
the same way. Following HDP1, we express the diagonal part of 
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Figure 2. (left:) Power spectrum cross-correlation coefficient matrix, estimated from 200 N-body simulations. As first measured byi lmes & HamiltonlfeOOSl) . 
there is a large region where the scales are correlated by 60-70 per cent. This is caused by the mode coupling that occurs in the non-linear regime of gravitational 
systems, (right:) Same matrix as left panel, but estimated from only 4 simulations. As expected from such a low number of measurements, the underlying 
structure of the matrix is partly masked by noise, especially in the low ir-modes where measurements are extracted from fewer Fourier modes. Note the 
difference in colour scale between the two panels. 



NG(k,k') and C(k,k') a§ 



NG(k, k) = )' and C(k, k)=^(l + (-*)") 

N(k) \k a ) N(k) \ \ko) I 



(4) 



In this parameterization, ko informs us about the scale at which 
non-Gaussian departure become important, and a is related to the 
strength of the departure as we progress deeper in the non-linear 
regime. The best-fitting parameters are presented in Table Q] An 
important result is that the two sets of parameters are consistent 
within la, which means that the N = 4 sample has enough informa- 
tion to extract the pair (a, ko), and therefore attempt non-Gaussian 
estimates of C(k, k'). The fractional error on both parameters is of 
the order of a few per cent in the large sample, and about 20-50 per 
cent in the small sample. A second important observation is that the 
fractional error about a is about twice smaller than that of k , which 
means that a is the easiest non-Gaussian parameter to extract. 

We are now in a position to ask which of G(k, k' ) or NG(k, k' ) 
has the largest contribution to the error on C(k, k). We present in 
the right panel of Fig.|4]the fractional error on both terms, in both 
samples. We scale the bootstrap error by V^V to show the sampling 
error on individual measurements, and observe that in both cases, 
the non-Gaussian term dominates the error budget by more than an 
order of magnitude. 



5 The notation is slightly different than in HDP1, which expressed 
C(*,*)=^(l + (ff), 
(k ,a). 



TTjjy^ Note the correspondance (a,fi) ■ 



2.4 ktk' case 

We now turn our attention to the off-diagonal part of the covari- 
ance matrix, whose sole contribution comes from the non-Gaussian 
term. For this reason, and because there are many more elements to 
measure from the same data (N 2 vs. AO, it is expected to be much 
noisier that the diagonal part. 

It is exactly this noise that makes C(k, k') singular, but luck- 
ily we can filter out a large part of it from a principal compo- 
nent analysis based on the Eigenvector decomposition of the cross- 
correlation coeffi cient matr i x r(k, k') = C(k, k')/ -\JC(k, k)C(k', k'). 
As discussed in iNgan etalJi2012h . this method improves the accu- 
racy of both the covariance matrix and of its inverse. HDP1 further 
provides a fitting function for the Eigenvector, and we explore here 
how well the best-fitting parameters can be found in a low statistics 
environment. 

This decomposition is an iterative process that factorizes the 
cross-correlation coefficient matrix into a purely diagonal compo- 
nent and a smooth symmetric off-diagonal part. The latter is further 
Eigen-decomposed, and we keep only the Eigenvector U(k) that 
corresponds to the largest Eigenvalue A. In that case, it is convenient 
to absorb the Eigenvalue in the definition of the Eigenvector, i.e. 
y/IU(k) -> U(k), such that we can write the r(k,k') ~ U(k)U(k') 
directly. Since the diagonal elements are unity by construction, the 
exact expression is r(k,k') = U(k)U(k') + <5«'[1 _ U 2 (k)]. This ef- 
fectively puts a prior on the shape of the covariance matrix, since 
any part of the signal that does not fit this shape is considered as 
noise and excluded. As shown in HDP1, this decomposition is ac- 
curate at the few per cent level in the dynamical range of interest, 
for the N = 200 sample. We present in the top panel of Fig. [5] the 
Eigenvector extracted from both samples, against a fitting function 
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Table 1. Best-fitting parameters of the diagonal component of the covariance matrix, parameterized as C(k, k) = G(k) + NG(k, k) = 2 ^^ |l + |^ j j, and for 

the principal Eigenvector, parameterized as U(k) = Ay ^ + lj . The error bars are obtained from bootstrap resampling and refitting the measurements. The 

Wiener filter and T-rotation techniques are described in sections l3Hl and lT2l respectrvelv. The parameters (a, ko) do not apply to the T-rotation, which acts only 
on the Eigenvector. 





ko 


a 


A 


h 


N = 200 sample 


0.24 ± 0.01 


2.24 ± 0.06 


1.027 ± 0.072 


0.07 ± 0.02 


N = 4 sample 


0.20 ± 0.09 


2.1 ±0.5 


1.00 ±0.03 


0.006 ± 0.02 


Wiener filtered 


0.19 


2.01 


1.00 


0.024 


T-rotation 


n/a 


n/a 


1.13 ±0.07 


0.09 ±0.13 


G{k) = 2P(k)/N(k) 


0.18 


2.13 


1.01 


0.021 



10" 2 



Eq. 2 

— • — Eq. 3 

Gauss 
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Figure 4. (top.) Ratio between the diagonal of the covariance matrix and 
the Gaussian analytical expression, i.e (G(k) + NG(k,k))/C g (k) in the N = 
200 sample. The measurements from [Eq.[5) are represented with the thin 
line, the thick solid line is from the fit, the dashed line is obtained from [Eq. 
|2) , and the horizontal dotted line is the linear Gaussian prediction. The error 
bars are from bootstrap resampling, (middle.) Same as top panel, but for the 
N = 4 sample, (bottom.) Fractional error on G and NG, scaled by V/V for 
comparison purposes. The thin and thick lines correspond to the N = 200 
and N = 4 sample respectively, and the calculations are from bootstrap 
resampling. It is obvious from this figure that the Gaussian term is much 
easier to measure than NG. 



other hand, k { is much harder to constrain: in the N = 200 sample, 
the fractional error bar is about 30 per cent, and in the N = 4, 
the error bar is three times larger than its mean. In other words, 
k\ is consistent with zero within 1 a in the small sample, which 
corresponds to a constant Eigenvector. The non-Gaussian noise is 
thus mostly concentrated here, and improving the measurement on 
k\ is one of the main tasks of this paper. 

If we model the N = 4 covariance matrix with this fit to the 
Eigenvector, the matrix no longer singular, as shown in Fig. [6] It 
does exhibit a stronger correlation at the largest scales, compared 
to the matrix estimated from the large sample. This difference roots 
in the fact that the N = 4 Eigenvector remains high at the largest 
scales, whereas the N = 200 vector drops. There is thus an over- 
estimate of the amount of correlation between the largest scales, 
which biases the uncertainty estimate on the high side. In any case, 
these large scales are given such a low weight in the calculation of 
Fisher information - they contain a small number of independent 
Fourier modes - that their contribution to the Fisher information is 
tiny. Figure [6] also shows that we do recover the region where the 
cross-correlation coefficient is 60-70 per cent, also seen in the left 
panel of Fig. [2] It is thus a significant step forward in the accuracy 
of the error estimate compared to the Gaussian approach. 

We show in sec tions [3~Tl and [3~!2l that noise filtering techniques 
are able to reduce this bias down to a minor effect, even when work- 
ing exclusively with the same four fields. To contrast the pipeline 
presented in this section (of the form [/V = 4 — > PCA — » fit ]) 
to those presented in future sections, we hereafter refer to this ap- 
proach as the 'N = 4 naive' way. 



of the fornB 



In this parameterization, A represents the overall strength of the 
non-Gaussian features, while k\ is related to the scale where cross- 
correlation becomes significant in our measurement. If A = 0, then 
we recover r(k,k') = 6kk', which corresponds to 'no mode cross- 
correlation'. The best fitting parameters are shown in Table [T] We 
observe that the error bar on A is at the per cent level, and that the 
measurements in both samples are both very close to unity. On the 

6 This parameterization of U(k) is equivalent to that of HDP1, but signif- 
icantly simplified: we factorize y outside of the parenthesis, and we set 
<5=1. This choice is motivated by the fact that in the bootstrap estimate, the 
error bar on S was at the fraction of a per cent, indicating a weak dependence 
of the fit on this variable. 



3 OPTIMAL ESTIMATION OF C(K, K'): BEYOND 
BOOTSTRAP 

In the calculations of section[2] we show that even in low statistics, 
we can extract four non-Gaussian parameters with the help of noise 
filtering techniques that assume a minimal number of priors on the 
non-Gaussian features. As mentioned therein, all the error bars are 
obtained from bootstrap resampling, which is generally thought to 
be a faithful representation of the underlying variance only in the 
large sample limit. How, then, can we trust the significance of our 
results in the N = 4 sample, which emulates an actual galaxy sur- 
vey? More importantly, can we do better than bootstrap? In this 
section, we expose new procedures that optimize the estimate of 
both the mean and the error on C(k,k'), and we quantify the im- 
provements on all four non-Gaussian parameters (a, A, ko and k{). 



Non-Gaussian Error Bars in Galaxy Surveys -2 7 




k[h/Mpc] 



Figure 5. Main Eigenvector extracted from the different estimates of the cross-correlation coefficient matrix. As explained in the text, the Eigenvalue is 
absorbed in the vector to simplify the comparison, (top:) The thin solid line (with error bars) and the thick solid line (with grey shades) represent the N = 200 
and N = 4 measurements respectively. Since no other noise reduction technique are applied to extract these vectors, we refer to this N = 4 estimate as the 
'naive' estimate. The error bars are from bootstrap, and the fits to these curves are provided with the parameters of Table[T]and represented by the dot-dashed 
lines, (bottom.) Ratio between the main Eigenvector extracted from three noise reduction techniques, and that of the N = 200 sample ( shown in the upper 
panel). The G(k) — » 2P 2 (k)/N(k) substitution is described in section l2~3l while the Wiener filter and T-rotation approaches are described in sections [TT] and 
I3.2l respectivelv. Except for the first bin, the T-rotation technique recovers the N = 200 Eigenvector to within 20 per cent for k < 0.2/iMpc~' ; all techniques 
achieve per cent level precision for smaller scales, due to the higher number of modes per /:-shell. 



3.1 Wiener filtering 

Let us recall that in the bootstrap estimate of the error on C(k, k'), 
we resample the measured C(k, k', 9), integrate over the angle 6 and 
add up the uncertainty from different angles - including the zero- 
lag point - in quadrature. We propose here a different approach, 
based on our knowledge that C(k, k',6) is larger for smaller angles. 
At the same time, we take advantage of the fact that the noise on 
G(k) is much smaller than that on NG(k,k'). Since the mean and 
error should be given more weight in regions where the signal is 
cleaner, we replace the quadrature by a noise-weighted sum in the 
angular integration of C(k, k',6). This replacement reduces the error 
on NG by an order of magnitude or so, depending on the scale (see 
Appendix [A] for details). 

At this stage, we now have accurate estimates of the error 



on the covariance C(k,k') in both the N = 4, plus accurate mea- 
surements of the signal and noise of the underlying covariance ma- 
trix from the N = 200 sample, which we treat as a template. The 
technique we describe here is a Wiener filtering approach that uses 
known noise properties of the system in order to extract a signal that 
is closer to the template. The error on G(k) is obtained from boot- 
strap resampling the zero-lag point, while the error on NG(k,k') 
comes from the noise-weighted approach mentioned above and dis- 
cussed in Appendix [A] We first apply the filter on both quantities 
separately, and then combine the results afterward. Namely, we de- 
fine our Wiener filters as 



Cwf - Gwf + NGwf (6) 
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Figure 6. Power spectrum cross-con'elation coefficient matrix, estimated 
from only 4 simulations, and extracted from a fit to the principal Eigenvec- 
tor (the so-called 'naive' way). No other noise reduction techniques are used 
in this particular calculation, and we see that this method alone removes a 
lot of the noisy features in the elements. However, there is a positive bias 
in the large scales (lower-left region) compared to the large sample (see 
left panel of Fig. [2j, which is caused by a discrepancy in the Eigenvec- 
tor estimate between the N = 4 and the N = 200 samples at the lowest 
&-modes. Fortunately, this region has very little impact on the Fisher infor- 
mation since it contains very few Fourier modes. 



with 

Gwf = Gioo + (G 4 - G 2 oo) ( — — ~°° ; j (7) 
V ^200 - °"4 / 

and 

NGwf = NG 100 + (NG 4 - NG 200 )i 2 °" 2 °° 2 | (8) 

Note that the errors that appear in the above two expressions corre- 
spond to the estimates from the N = 200 and N = 4 samples on G 
and NG respectively. 

We present in Fig.[7]the Wiener filtered variance on P(k), com- 
pared to the N = 200 and N = 4 samples. We observe that the 
filter has a mild effect on the ranee 0.08 < k < 0.3/tMpc" 1 , but 
then the noise decreases by almost a factor of two compared to the 
original N = 4 sample. There is a positive bias that appears for 
k > 1.0/iMpc , therefore results beyond that point should be ex- 
cluded from the analyses. The full Wiener filtered cross-correlation 
matrix is presented in Fig.[8]and shows that some of the noise has 
been filtered out. The regime k > 0.5/iMpc~' is smoother than the 
original N = 4 measurement, and except for the largest two modes, 
the matrix is brought closer to the N = 200 sample. 

The next steps consist in computing the new Eigenvector U (k) 
constructed with C WF and to find the new best-fitting parameters 
(a, ko, A, ki). The bottom panel of Fig.|5]presents U(k) and shows 
that most of the benefits are seen for 0.08 <k< 0.2/7Mpc _1 , which 
is the range we targeted to start with, and the best-fitting parameters 
are tabulated in Table[T] Overall, the improvement provided by the 
Wiener filter is still hard to gauge by eye from Fig. [8] because the 
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Figure 7. Comparison between the variance about P(k) with and without 
the Wiener filtering technique. Results are expressed as the fractional error 
with respect to the N = 200 sample, acting as a signal template in our filter. 
This plot shows that Wiener filtering can reduce the noise by almost a factor 
of two on the diagonal elements. 
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Figure 8. Cross-correlation matrix after the Wiener filter has been applied. 
This matrix is to be compared with both panels of Fig. [2] There is still a 
significant amount of noise, and it is hard to see the improvement by eye. 



Eigenvector is still very noisy. The method is mostly efficient on 
the diagonal part, where we can take advantage of the low noise 
level of the Gaussian term. 
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3.2 Noise-weighted Eigenvector decomposition 

In this section, we describe a last noise filtering technique that uti- 
lizes known properties of the noise about the Eigenvectors and their 
Eigenvalues to improve the way we perform the principal compo- 
nent decomposition in the N = 4 sample. It is a general strategy that 
could be combined with others techniques described in the preced- 
ing sections, however, for the sake of clarity, we only present here 
the standalone effect. 

In the Eigen-decomposition, not all Eigenvalues are measured 
with the same precision. For instance, most of the covariance ma- 
trix can be described by the first Eigenvector U(k), hence we ex- 
pect the signal-to-noise ratio about its associated Eigenvalue to be 
the largest. Considering again the N = 200 sample as our template 
of the underlying covariance, the error on each A can be obtained 
by bootstrap resampling the 200 realizations. We present this mea- 
surement in Fig. [9] where we observe that the first Eigenvalue is 
more than an order of magnitude larger than the others, which are 
also noisier. 

Since general Eigenvector decompositions are independent of 
rotations, our strategy is to rotate the N = 4 cross-correlation matrix 
into a state where it is brought closer to the template, then apply a 
signal-to-noise weight before the Eigenvector decomposition. More 
precisely, we apply the following algorithm, which we refer to as 
the 'T-rotation' method in the rest of this paper: 

(i) Rotate the noisy (i.e. N = 4) cross-correlation coefficient ma- 
trix in the Eigenstates T of the template 

(ii) Weight the elements by the signal-to-noise ratio of the cor- 
responding Eigenvalues 

(iii) Perform an Eigenvector decomposition on the resulting ma- 
trix 

(iv) Undo the weighting 

(v) Rotate back 



The rotation in step (i) is defined as: 
R = T'pT 



(9) 



and, by construction, reduces to the diagonal Eigenvalues matrix in 
the case where p is the template cross-correlation coefficient ma- 
trix. The weighting in step (ii) is performed in two part.^: 1- we 
scale each matrix element Rjj by 1 / JAjAj, and 2- we weight the re- 



sult by the signal to noise ratio of each A. Combining, we defin^|: 



DijsR tj 



A.Aj 



i RijWiWj 



(10) 



As seen in Fig. [9] the Eigenvalues drop rapidly, and we expect only 
the first few to contribute to the final result. In fact, our results 
present very small variations if we keep anywhere between two 
and six Eigenvalues and exclude the others. Since it was shown 
in HDP1 that in some occasions, we need up to four Eigenvectors 
to describe the observed C(k,k') matrix, we choose to keep four 
Eigenvalues as well, and cut out the contributions from /1,>4. 

In step (iii) the resulting matrix D is decomposed into Eigen- 
vectors S: 



D = S/l D S- 



(11) 



7 The first part takes the ^template matrix into the identity matrix, and allows 
the second part to done with the correct units. 

8 This matrix is not to be confused with the fluctuation matrices of [Eq. 
|A2). 




Figure 9. Signal and noise of the Eigenvalues measured from the N = 200 
sample. The error bars are obtained from bootstrap resampling the power 
spectrum measurements. Only the four largest Eigenvalues are kept in the 
analyses. 



Then, in step (iv), these Eigenvectors are weighted back by absorb- 
ing the weights directly: 

S~ij = Sij/wi (12) 

The result is finally rotated back into the original space: 

p = fA D T~i with f = TS (13) 

If no cut is applied on the Eigenvalues, this operation essentially 
does nothing to the matrix, as the equivalence between p and p 
is exact: every rotation and weights that are applied are removed, 
and we get U(k) = f[(k). However, the cut, combined with the 
rotation and weighting, acts as to improve the measurement of the 
Eigenvector. 

We present in the bottom panel of Fig. [5] the effect of this T- 
rotation on the original TV = 4 Eigenvector. We observe that it traces 
remarkably well the N = 200 vector, to within 20 per cent even at 
the low /r-modes, and outperforms the other techniques presented 
in this paper in its extraction of U (k). The best fitting parameters 
corresponding to T\{k) are summarized in Table [T] We discuss in 
section [5] how, in practice, one can us these techniques in a real 
survey. 



4 IMPACT ON FISHER INFORMATION 

In analyses based on measurements of P(k), the uncertainty typi- 
cally propagates t o cosmological parameters within the formalism 
of Fisher matrices dTegmarkll997l) . The Fisher information content 
in the amplitude of the power spectrum, defined as 



(14) 



effectively counts the number density of degrees of freedom in a 
power spectrum measurement. In the above expression, C norm is 
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simply given by C(k, k')/[P(k)P(k')]. The Gaussian case is the sim- 
plest, since the covariance is given by C g (k) = 2P 2 (k)/N(k), where 
N(k) is the number of cells in the /r-shell. We recall that the factor of 
two comes in because the P(-k) = P(k) symmetry, which reduces 
the number of independent elements by a factor of two. Also, C norm 
reduces to 2/N(k), and I{k max ) = ^ t N(k)/2 for Gaussian fields. 

We see how I(k) is an important intermediate step to the full 
Fisher matrix calculation, as it tells whether we can expect an im- 
provement on the Fisher informatio n from a given increase in sur- 
vey resolution. It was first shown bv lRimes &Hamiltonl J2005l) that 
the number of degrees of freedom increases in the linear regime, 
following closely the Gaussian prescription, but then reaches a 
trans-linear plateau, followed by a second increase at even smaller 
scales. This plateau was later interpret ed as a transition be tween 
the two-haloes and the one-halo term jNevrinck et l2O06h . and 
corresponds to a regime where the new information is degenerate 
with that of larger scales. By comparison, the Gaussian estimator 
predicts ten times more degrees of freedom by k ~ 0.3/?Mpc~'. 

What stops the data analyses from performing fully non- 
Gaussian uncertainty calculations is that the Fisher information re- 
quires an accurate measurement of the inverse of a covariance ma- 
trix similar to that seen in Fig. [2] which is singular. With the noise 
reduction techniques described in this paper, however, the covari- 
ance matrix is no longer singular, such that the inversion is finally 
possible. To recapitulate, these techniques are: 

(i) The 'Naive N = 4 way' : straight Eigenvector decomposition 
+ fit of the N = 4 sample (section l2l4l 

(ii) Same as (i), with the G(k) — > substitution (section [23t 

(iii) Wiener filtering of G(k) and NG(k, k') (section [3TTT l 

(iv) T-rotation (section [3~2l 

In this section, we assume that there are no survey selection func- 
tion effects, and that the universe is periodic. We discuss more re- 
alistic cases in section|5] 

We present in the top panel of Fig. [10] the Fisher informa- 
tion content for each technique, compared to that of the template 
and the analytical Gaussian calculation. We also show the results 
for the N = 200 sample after the Eigen-decom position, which i s 
our best estimator of the underlying information dNgan et al.l2012h . 
The agreement between this and the original information content in 
the N = 200 sample is at the few per cent level for k < 1.0/iMpc -1 
anyway. We do not show the results from the N = 4 sample, nor that 
after the Eigenvector decomposed only, as these quickly diverge. It 
actually is the fitting procedure that clean up most of the noise. In 
all these calculations, the error bars are obtained from bootstrap re- 
sampling. The bottom panel represents the fractional error between 
the different curves and our best estimator. 

As first found by iRimes & Hamilton! d2005h . the deviation 
from Gaussian calculations reaches an order of magnitude by k ~ 
0.3/j/Mpc, and increases even more at smaller scales. With the fit 
to the Eigenvector (technique (i) in the list above mentioned), we 
are able to recover a Fisher information content much closer to the 
template; it underestimates the template by less than 20 per cent for 
k < 0.3/iMpc" 1 , and then overestimates the template by less than 60 
per cent away for 0.3 < k < 1 .0/iMpc~' . The analytical substitution 
of G{k) (technique (ii)) has even better performances, with maxi- 
mum deviations of 20 per cent over the whole range. The Wiener 
filter (iii) is not as performant, but still improves over the naive 
N = 4 way, with the maximal deviation reduced to less than 45 per 
cent. The T-rotation (iv) also performs well, with deviations by less 
than 20 per cent. Most of the residual deviations can be traced back 
to the fact that in the linear regime, the covariance matrix exhib- 



ited a large noise that we could not completely remove. This extra 
correlation translates into a loss of degrees of freedom in the linear 
regime, a cumulative effect that biases the Fisher information con- 
tent on the low side. Better noise reduction techniques that focus in 
the large scales cleaning could outperform the current information 
measurement. In any case, this represents a significant step forward 
for non-Gaussian data analyses since internal estimates of l(k) can 
be made accurate to within 20 per cent over the whole BAO range, 
even from only four fields. 



5 IN THE PRESENCE OF A SURVEY SELECTION 
FUNCTION 

The results from section[4]demonstrate that it is possible to extract 
a non-Gaussian covariance matrix internally, from handful of ob- 
servation patches, and that with noise filtering techniques, we can 
recover, to within 20 per cent, the Fisher information content in 
the amplitude of the power spectrum of (our best estimate of) the 
underlying field. The catch is that these are derived from an ideal- 
ized environment that exist in only in N-body simulations, and the 
objective of this section is to understand how, in practice, can we 
apply the techniques in actual data analyses. We explore a few 
simple cases that illustrates how the noise reduction techniques can 
be applied, and how the non-Gaussian parameters can be extracted 
in the presence of a survey selection function W(x). 

5.1 Assuming deconvolution of W(k) 

The first case we consider is the simplest realistic scenario one can 
think of, in which the observation patches are well separated, and 
for each of these the survey selection functions can be successfully 
deconvolved from the underlying fields. For simplicity, we also as- 
sume that each patch is assigned onto a cubical grid with constant 
volume, resolution and redshift, such that the observations combine 
essentially the same way as the N = 4 sample presented in this pa- 
per. Once the grid is chosen, one then needs to produce a large sam- 
ple of realizations from N-body simulations, with the same volume, 
redshift and accuracy, and construct the equivalent of our N = 200 
sample. 

(i) In the 'naive N = 4' way, one needs to compute the (noisy) 
covariance matrix from the data sample, compute the ratio of the di- 
agonal to the Gaussian predictions and fit, then compute the cross- 
correlation coefficient matrix p(k, k'), Eigen-decompose and fit the 
main Eigenvector U(k). The noise filtered estimate of the covari- 
ance is recovered by combining the fitting functions at the centre 
of the fc-bins for the ratio and U(k), using p(k, k') = U(k)U(k') and 
C(k, k') = p(k, k') ^C(k,k)C(k,k'). 

(ii) To construct the Wiener filter described in section 13.11 one 
needs to use the methodology of HDP1 and compute C(k,k',ff) 
from the density fields of the data and the large simulated sample, 
and finally extract G{k) and NG(k,k') from [Eq.[3] to construct the 
filter. At that stage, it is also trivial to try the semi-analytical substi- 
tution G(k) 2P 2 (k)/N(k) and reduce the noise even more. After 
that, we need to compute the ratio and U (k) as described above, find 
the best-fitting parameters, and reconstruct the covariance matrix. 



9 As mentioned in the introduction of this paper, the key missing ingredient 
in this paper is the inclusion of redshift distortions, which will be the core 
subject of the third paper of this series. 
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Figure 10. (top:) Fisher information extracted from various noise filtering techniques, compared to Gaussian calculations. Results from the original N = 200 
sample are shown by the thin solid line, with error bars plotted as grey shades; calculations from the main Eigenvector of the N = 200 matrix are shown by 
the dashed line, with error bars; results from fitting directly the main Eigenvector of the N = 4 sample (see section |2~4l are shown by the thick solid line; the 
effect of replacing G(k) — > 2P(k)/N(k) is shown by the thick dashed line (see section l2~3l ; the effect of Wiener filtering the covariance matrix is shown by the 
crosses (see section |3Tt ; finally, the noise-weighted T-rotation calculations are shown by the open circles (see section |T2"V (bottom:) Fractional error between 
each of the top panel curves and the measurements from the main N = 200 Eigenvector. 



(iii) To make use of the T-rotation technique, one needs to com- 
pute, from the large simulated sample, the Eigenvectors that de- 
scribe the cross-correlation coefficient matrix, plus the noise about 
each Eigenvalue, which can be obtained from bootstrap resampling 
the simulated realizations. The weights w, can then be computed, 
and the rest of the technique follows directly from section [3l2l such 
that we end up with a better estimate of U (k) - to be fitted as well. 
This technique improves only the estimate of p, so that one then has 



some freedom regarding which estimate of the diagonal element to 
choose (fit to the raw data, the Wiener filtered, etc.). 



5.2 Assuming no deconvolution of W(k) 

The second case is a scenario in which the observation mask was 
not deconvolved from the underlying field. This set up introduces 
many extra challenges, as the mask tends to enhance the non- 
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Gaussian features, hence, for simplicity, we assume that there is a 
unique selection function W(k) that covers all the patches in which 
the power spectra are measured. Let us first recall that in presence 
of a selection function, the observed power spectrum of a patch 
is related to the underlying one via a convolution with the Fourier 
transform of the mask, namely: 

P obs (k= f nk')|W(k'-k)| 2 dk' (15) 

The first paper of this series describes a general extension 
to the FKP calculation in which the underlying covariance ma- 
trix C(k, k') is non-Gaussian. Specifically, the 'observed' covari- 
ance matrix C j s (k, k') is related to the underlying one via a six- 
dimensional convolution: 

C obs (k,k')oc £ C(k",k'")|W(k-k")| 2 |W( k '- k '")| 2 ( lf >) 

k".k"' 

In HDP1, C(k, k') is calculated purely from N-body simula- 
tions, therefore it is known a priori; only P(k) and W(x) are ex- 
tracted from the survey. It is in that sense that the technique of 
HDP1 provides an external estimate of the error. What needs to be 
done in internal estimates is to walk these steps backward: given a 
selection function and a noisy covariance matrix, how can we ex- 
tract the non-Gaussian parameters of Table [T]? Ideally, we would 
like to deconvolve this matrix from the selection function, but the 
high dimensionality of the integral in rEq. 1161 makes the brute force 
approach numerically not realistic. 

There is a solution, however, which exploits the fact that many 
of the terms involved in the forward convolution are linear. It is thus 
possible to perform a least square fit for some of the non-Gaussian 
parameters, knowing C Q ^ S and W(x). We start by casting the under- 
lying non-Gaussian covariance matrix into its parameterized form, 
which expresses each of its Legendre multipole matrices C' ( J into 
a diagonal and a set of Eigenvectors (see [Eq. 51-54] and Tables 
1 and 2 of HDP 10). We recall that only the I = 0, 2, 4 multipoles 
contain a significant departure from the Gaussian prescription; the 
complete matrix depends on 6 parameters to characterize the diag- 
onal components, plus 30 others to characterize the Eigenvectors. 
In principle, these 36 parameters could be found all at once by a 
direct least square fit approach. However, some of them have more 
importance than other, and, as seen in this paper, some of them are 
easier to measure, therefore we should focus our attention on them 
first. In particular, it was shown in figure 22 of HDP1 that most 
of the non-Gaussian deviations come from Co, hence we start by 
solving only for its associated parameters. 

To simplify the picture even more, we decompose the problem 
one step further and focus exclusively on the diagonal component. 
In this case, we get, with the notation of the current paper: 

C(k,k',0) ~ C (k,k',n) = C G (fc)4*'(<5„i + ) (17) 

The 'tilde' symbol serves to remind us that this is not the complete 
£ = multipole but only its diagonal. The term with the d^i is the 

10 In the following discussion, we refer substantially to sections 7.2, 8 
and 8.1 of HDP1, and we try to avoid unnecessary repetitions of lengthy 
equations here. Also, as discussed in section l2~4l some of the parameters in 
Table 2 of HDP 1 are degenerate, and in fact only 30 are necessary. For each 
multipole, the main Eigenvector requires only A and ki, and we can merge 
A into a for the others. 



Gaussian contribution, and yields to ([Eq. 56] of HDP1): 

C° G bs (k,k') = £ C G (k")\W(k" - k)\ 2 \W(k" - k'f (18) 

k" 

For the second term, the 5tw allows us to get rid of one of the radial 
integral in [Eq. 1161 , and the remaining part is isotropic, therefore 
the angular integrals only affect the selection functions. These in- 
tegrals can be precomputed as the X(k, k") function in [Eq. 57] of 
HDP1, with w(6",(f>") = 1, and we get 

cf\k,k') = cf\k,k') + £ (^)"(x(k,r))(*(k',r)) (19) 

where the angle brackets refer to an average over the angular de- 
pendence. At the end of this calculation, we obtain, for each (k, k') 
pair, a value for a and /?, and all that is left is to find the parameter 
values that minimize the variance. 

The next step is to include the off-diagonal terms of the I = 
multipole, in which case the full Co is modelled. The underlying 
covariance matrix is now parameterized as 

C(k, k\6)~ C (k, k',p) = C (k, k',ji) + H (k, k') (20) 
where 

H (k,k') = (F (k)F (k') - Fl(k)S(k - #)) (21) 
and 

F (k) = U(k) ^l+IUjcoW (22) 
In this case, [Eq. |19l is modified and we get 

C° hs (k,k') = C° b '(k,k')+ ^ H (k'\k'")(x(k,k'')j(x(k',k''')j (23) 

k" ,k" 

There are two new best-fitting parameters that need to be found 
from Hq, namely A and k\, and we can use our previous results on 
a and ko as initial values in our parameter search. Since the X func- 
tions only depend on W(k), we can still solve these A' 2 equations 
with a non-linear least square fit algorithm and extract these four 
parameters. 

Including higher multipoles can be done with the same strat- 
egy, i.e. progressively finding new parameters from least square 
fits, using the precedent results as priors for the higher dimensional 
search. One should keep in mind that the convolution with C2 and 
d becomes much more involving, since the number of distinct X 
functions increases rapidly, as seen in Table 4 of HDP 1. In the end, 
all of the 36 parameters can be extracted out of N 2 matrix elements, 
in which case we have fully characterized the non-Gaussian prop- 
erties of the covariance matrix from the data only. 

The matrix obtained this way is expected to have very little 
noise, as this goes straight for the fitting functions. In principle, 
assuming that the operation was loss-less, the recovered covariance 
matrix would be completely equivalent to the naive N = 4 way, 
had the selection function been deconvolved first. It could possible 
to improve the estimate even further by attempting the T-rotation 
on the output, as explained is section [5~T| but since we do not have 
access to the underlying density fields, the Wiener filter technique 
is not available in this scenario. 

6 DISCUSSION 

With the recent realization that the Gaussian esti mator of error bars 
on the BAO scale is biased by at least 15 per cent ( iNgan et aljioil 



Non-Gaussian Error Bars in Galaxy Surveys -2 13 



lHarnois-Deraps & Penl2012h . efforts must be placed towards incor- 
porating non-Gaussian features about the matter power spectrum in 
data analyses pipeline. This implies that one needs to estimate ac- 
curately the full non-Gaussian covariance matrix C(k, k') and, even 
more challenging, its inverse. The strategy of this series of paper is 
to address this bias issue, via an extension of the FKP formalism 
that allows for departure from Gaussian statistics in the estimate of 
C(k, k'). The goal of this paper is to develop a method to extract di- 
rectly from the data the parameters that describe the non-Gaussian 
features. This way, the method is free of the biases that affect ex- 
ternal non-Gaussian error estimates (wrong cosmology, incorrect 
modelling of the non-Gaussian features, etc.). 

We emulate a typical observation with a subset of only N = 4 
N-body simulations, and validate our results against a larger N = 
200 sample. The estimate of C(k, k') obtained with such low statis- 
tics is very noisy by nature, and we develop a series of independent 
techniques that improve the signal extraction: 

(i) We break down the full matrix into a diagonal and off- 
diagonal component, extract the principal component of the latter, 
and find best-fitting parameters based on general trends that are 
known from the N = 200 sample, 

(ii) We write down the full matrix as a sum of a Gaussian com- 
ponent G(k)Stk' and a non-Gaussian component NG(k,k') and re- 
place the former by the analytical prediction, 

(iii) We optimize the uncertainty on the smooth non-Gaussian 
component from a noise- weighted sum over the different angular 
contributions, and apply a Wiener filter on the resulting covariance 
matrix, 

(iv) We rotate the noisy covariance matrix into the Eigenspace 
of the large sample and weight the elements by the signal-to-noise 
properties of the Eigenvalues. 

These techniques are exploiting known properties about the 
noise, and assume a minimal number of priors about the signal. 
We quantify their performances by comparing their estimate of the 
Fisher information content in the matter power spectrum to that of 
the large N = 200 sample. We find that in some cases, we can 
recover the signal within less then 20 per cent for k < 1.0/iMpc . 
We also provide error bars about the Fisher information whenever 
possible. By comparison, the Gaussian approximation deviates by 
more than two orders of magnitude at that scale. 

We find that the diagonal component of NG(k, k') is well mod- 
elled by a simple power law, and that in the N = 4 sample, the 
slope a and the amplitude ko can be measured with a signal-to- 
noise ratio of 4.2 and 2.2 respectively. The off-diagonal elements 
of NG(k, k') are parameterized by fitting the principal Eigenvector 
of the cross-correlation coefficient matrix with two other parame- 
ters: an amplitude A, which has a signal-to-noise ratio of 32.1, and 
a turnaround scale jfci, which is measured with a ratio of 0.30 and 
is thus the hardest parameter to extract. Even in the large N = 200 
sample, k\ is measured with a ratio of 3. 1 , which is five to ten times 
smaller than the other parameters. This help us to better understand 
the parts of the non-Gaussian signals that are the noisiest, such that 
we can focus our efforts accordingly. 

We then propose a strategy to extract these parameters directly 
from surveys, in the presence of a selection function. We explore 
two simple cases, in which a handle of observation patches of equal 
size, resolution and redshift are combined, with and without a de- 
convolution of the survey selection function. The first case is the 
easiest to solve, since the deconvolved density fields are in many 
respect similar to the simulated N = 4 sample that is described in 
this paper. We show that it is possible to apply all of the four noise 



filtering techniques described above to optimize the estimate of the 
underlying covariance matrix. 

In the second case, we explore what happens when decon- 
volution is not possible. We propose a strategy to solve for the 
non-Gaussian parameters with a least square fit method, knowing 
Cobs(k,k') and W(k). The approach is iterative in the sense that 
it first focuses on the parameters that contribute the most to the 
non-Gaussian features, then source the results as priors into more 
complete searches. 

The main missing ingredient from our non-Gaussian parame- 
terization is the inclusion of redshift space distortions, which will 
be the focus of the next paper of this series. To give overview of the 
challenge that faces us, the approach is to expand the redshift space 
power spectrum into a Legendre series, and to compute the covari- 
ance matrix term by term, again in a low statistics environment. 
Namely, we start with P{k,n) = Po{k)+P 1 {k)P 2 (^)+PA{k)P^)+ ... 
where Pj(k) are the multipoles and 'Pci/i) the Legendre polynomi- 
als, and compute the nine terms in C(k,/i,k' ,yu') = (Po(k)Po(k')) + 
(Po(k)P 2 (k'))'P2Q J ) + ■■■ We see here why a complete analysis needs 
to include the auto- and cross-correlations between each of these 
three multipoles, even without a survey selection function. 

As mentioned in the introduction, many other challenges in 
our quest for optimal and unbiased non-Gaussian error bars are 
not resolved yet. For instance, many results, including all of the 
fitting functions from HDP1, were obtained from simulated parti- 
cles, whereas actual observations are performed from galaxies. It is 
thus important to repeat the analysis with simulated haloes, in order 
to understand any differences that might exist in the non-Gaussian 
properties of the two matter tracers. In addition, simulations in both 
HDP1 and in the current paper were performed under a specific 
cosmology, and only the z = 0.5 particle dump was analyzed. Al- 
though we expect higher lower redshift and higher Sl m cosmolo- 
gies to show stronger departures form Gaussianities - clustering 
is stronger - we do not know how exactly this impact each of the 
non-Gaussian parameters. 



7 CONCLUSION 

We have developed techniques that allow for measurements of non- 
Gaussian features in the power spectrum uncertainty for galaxy sur- 
veys. We separate the contributions to the total correlation matrix 
into two kinds: diagonal and off-diagonal. The off-diagonal is ac- 
curately captured in a small number of eigenmodes, and we have 
used the eigenframes of the N-body simulations to optimize the 
measurements of these of-diagonal non-Gaussianity. We show that 
the rotation into this space allows for an efficient identification of 
non-Gaussian features from a only four survey fields. The method 
is completely general, and with a large enough sample, we always 
reproduces the full power spectrum covariance matrix. We finally 
describe a strategy to perform such measurements in the presence 
of survey selection functions. 
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APPENDIX A: BOOTSTRAP VS NOISE- WEIGHTED 
INTEGRATION 

This appendix describes a technique that improves the calculation 
of the uncertainty on NG(k, k') compared to the bootstrap approach. 
Recall that bootstrap combines the error from different angles in 
quadrature, even though the signal and noise strength vary for dif- 
ferent angles. To perform the noise-weighted angular integration, 
we first normalize each realization Cj(k,k',9) by the mean of the 
distribution: 



D i (k,k',9)±cr D = 



Cj(k,k',8) 
C(k,k',9) 



C(k,k',6) 



(Al) 



where ctq is the standard deviation in the sampling of C(k,k',8). 
As seen in Fig. IA1I the individual distributions of D t (k, k',9) are 
relatively flat, with a slight tilt towards smaller angles. The two 
panels in this figure correspond to scales k = k' = 2.1/?Mpc~' and 
k = kf = 0.31/?Mpc~' respectively. In addition, the error bars cr D 
get significantly smaller towards 9 = (or fj = 1). It is thus a good 
approximation to replace each fluctuation by its noise-weighted 
mean: 



D,(k, k',9) -» Dj(k, k') ^o-IYj 



Dj(k,k',9) 



The measurement of a matrix element d(k,k') from a given real- 
ization becomes: 

Ci(k, k') = Gj(k, k') + NGi(k, k') 

= Gj(k, k') + ^j k' , 9)w(9) 

= Gj(k, k') + V Di(k, k',9)C(k, k', 9)w(9) 

= GKk, k') + Dj(k, kf) ^ C(k, k',9)w(9) 

(A3) 

where we have used the substitution of [Eq. IA2I in the last step. 
Note that in the above expressions, <x CD depend on the variables 
(k, kl ', 9), while cr T depends on (k, k'). We have chosen not to write 
these dependencies explicitly in our equations to alleviate the no- 
tation. The mean value of C(k,k') computed with this method is 
identical to the bootstrap approach of [Eq.(3), since the realization 
average of Dj(k, k') is equal to unity by construction. However, this 
method has the direct advantage to reduce significantly the error on 
NG and, consequently, on C. 

We show in Fig. lA2l a comparison between the bootstrap sam- 
pling error bars on the C(k, k') matrix, and our proposed noise- 
weighted scheme. In the left panel, we hold kf = k, whereas in 
the right panel, we keep k = 0.628/iMpc and vary kf . The error bars 
achieved in the noise-weighted scheme are up to two orders of mag- 
nitude smaller than the bootstrap errors, and the estimate of the er- 
ror from the small sample is already accurate. We also observe that 
the bootstrap fractional error on C is scale invariant, whereas that 
from the noise-weighted method drops roughly as kr 2 and thereby 
yields much tighter constraints on the measured matrix elements. 
This comes from the fact that as we go to larger fc-modes, the signal 
becomes stronger for small angles, which improves the weighting. 
In both samples, by the time we have reach k = 0.1/?Mpc~', the im- 
provement is about an order of magnitude, and at k = l.OftMpc , 
the improvement is almost two orders of magnitude. With a frac- 
tional error that small, the covariance matrix is precisely measured 
even with a handful of realizations, a claim that bootstrap approach 
can not support. 
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Figure Al. (left.) Fluctuations in the angular dependence of the matter power spectrum covariance, for k = k' = 2.36AMpc , which corresponds to a scale 
of 2.7/r'Mpc. The error bars are the lo" standard deviation in the N = 200 samples. The dotted lines are the individual 200 realizations, while the thin solid 
lines represent the realizations from the N = 4 sample, (right:) Fluctuations for k = k' = 0.314/iMpc~' , which corresponds to a scale of 20.0/i _1 Mpc. The 
variations in the individual fluctuations are much stronger than in the non-linear regime, a result that approaches the expected behaviour of Gaussian random 
fields, where different measurements are less correlated. 




Figure A2. (top left.) Comparison between bootstrap and noise weighted estimates of the sampling uncertainty on the diagonal of the covariance matrix. We 
see that the improvement on the precision of this measurement is enhanced significantly in the second scheme, even when very few samples are available. The 
thick lines are obtained from 200 realizations, the thin lines from only 4. (bottom left.) Fractional uncertainty comparison, (top and bottom right.) Same as left 
column, for the case where one of the scale has been fixed to k = 0.628/iMpc~' . 



